TABLE 2: Nearest Neighbor (BlastN v. Genbank?)




AB014599



AL050069
AFT3296r



AF134983



AF100753
AB015982



Z81037 



H5i^iHli^iif^i^<iAA06^^ 



H^o sapie7iiXGr^9'protiirr 



MH^li^UiaUrif^i?^^ regulator- 
of proteolysis 



Homo sapiens ancient ubiquitous 46 kDa

protein AUP1

H^^^^^li^serine/threomniT^iHi^ 

Homo sapiens type I collagen 



98.496 



6047 



Genefinder; Weak similarity in N-terminus
to UNC-42 (WP:F58E6.1); cDNA EST 
EMBL:Z14323 comes from this gene 



181 

loT" 



99.888
96.429




213 




AB002330



AF083419 



AB020649 



1920 



AF123880



1921 



1922 



1923 



1924 



1925 



1926 



1927 



1928 



AB002306 
X06745 



AF028827 



X71428 



AF093250 



AF151877 
"APi65124" 



Homo sapiens putative progesterone

binding protein

Homo sapiens KlAAOii/ 



H^F^olipienslalc^^ 
dependent protein kinase II beta e subunit 

H5f^^51ipii?^rKiAAO^ 



li^dti^iilEiiFSii^^ 

element unknown protein U5/1 



cyclase (S56776) 
Homo sapiens yio\V 



receptor gamma 2 




H5S^li[^iif^i1<iAA^ 
Homo sapiens DNA polymerase alpha-
subunit (AA 1 - 1462)



Homo sapiens Tax interaction protein 40



H^j^^^l^^i^US gycline rich prS 
-H^^^^^Tii^iiJ^ilar to yeast aaenyiate" 



■RSi^Slipiii^l-ll^ protein 






TABLE 3: Nearest Neighbor (Fas[...])

SMITH-
WATERMAN
SCORE



IDENTITY




AB012223
X53581
L27428
U93568
Y12713
AF003535
AF123881



AF123881



AB022046

AF015539
Y12713
AL032660

AJ005073



AL117237



U93570 



U09116 



M34651 
L76559 



jjanxa j-cLi...-^ — — 

Rattus norvegicus ORF4

retrovxru 

element_gagpoliEro^^ 
^^^oEIilT^^li^^^i^^^^^^^^ retrovir 
Cynops pyrrhogaster alpha1 type
collagen.

Genefinder

Mus musculus Alix



223 
227 
>85 
252 
549 



Mus musculus Alix



Homo sapiens ORF2,

transcriptase homolog

^«^o nNR noivmerase alp 



265 
367 
930 



2666 
523 



159 
271 



T.^^cnp hila meianog ^ ^^' "n..,.. ' 

subunit (AA 1 - 1462)



382 



68.317 



280 







529 


98.718 1 


10284 

10286 
10287 

10288 

10289 


AF009329 

AL049558 

L15309 

Z98596 

U13152 

X93302 


and naxry- 

relatedprotein_2 rCTZ.^^i nrotein 

^^^^^^^^^^^ 

binding protein beta 5 -^^,-7^^ 

Saccharomyces cerevisiae


267 
237 
302 

362 

173 


37.725 
85.714 
55.000

53.636 

35.200 






TABLE 4 




SEQ ID NOS: 



IZ 65 71 740 6 835 84S 852 867 

640-641 654 692 ij.o7 1289 1302 

^tLfUrts 17 822-1823 1846 1927 

1317 1464 1492 1679 1 ^BlB 2820 

1945 2039 2164 2217 2372 2 ^^^^ ^^^^ ^^^^ 

2823 2834-2835 2862 287/ ^978 

III' "Srfofo - 3 3045 30e8 3068_ 

321: 35^3 nil S -ri f4ra 

3515 3534 3536 3542 3543 ^^^^ | 

3--3f9? r7ro ? - -S 3762 3,79 

: ! r3 r, ^6 

Zl -iro 50 4178 4193 4228 
4243 4247 4252 4254 425S 4264 4274 42 

:3ro till :«r:4 - 4421 4423 44. 

r4e8 r4"2-::9! - 4518 4526- 

:"i r="o ^5 ii:? « - t^fe 

-87 r7"-:7r8 fsi^i - 
^4 - 1- 

^1^3 li" - 5259 

"fo ^4^9 f4r2 I - - 

i6!niri i;;;- I'^i 

r79l Tsfo ironi - - 5863 5368 

ir^ - roi7 rof7 -1 

6114 6131 6134 6143 6145 6153 6157 Jl 

6321 6323 6325 6327 bj^o f-.g-,.6232 6407 6414 
S 6 

1-7-/61:1 11^3 6 - 65,2 6S75 6610 6618 






r 



U^00_1G67^ 



679 




U20286 
1^3016092 
AJ243460
X61296 
M22334 
AL033534 
U93570 
AF010144 
1 Y00706 



AL04980r
U49973



AB01760T
X89430
AF003535



li5;^Ii^iini^iuronal_t^ 

-:: — : — r j«^o aTr)ai-chaxn ■ 



similar to pogo element



Haliotis rr,a binding protein 2 

Homo sapiens methyl-CpG

— : ^ -i f <=-ra collagen 



H omo sapi t^ii° 1-1 ^^^r. 

i;i^Ii"^iiui^iI^£i£££-^^Ht2^J^ 

Rattus n^ve gi£Hg_DRgIi^ 



59.028 



945 



3775 R 

3776 ^ 

3777 ; 


£001793 1 
80344 i 
0.117203 C 
I 


; — r- 5 

^m^narv orediCtion ■ _^„j~^„H-ina 

'^^^^^^^l^i^i^rKTln^^ns predxcted coding 


6 5 
32 4 
2 4 

37 : 


8.333 
3.939 
5.161 

$7,500 


3778 2 


\E001098 
^F085753 


Erimystax dissimilis wiuj" h 


84 


32.609 


3779 






95 


38.889 


3780 
3781 


U60669 
U60669 




99 
106 


39.189 
56.000 


3782 
3783 


M18327 
U15370 




79 
82 


45.000 
27.500


3784 
3785 


AB001684 
Z72511 


unknown possible "^^^^ ^^3. ^DNA EST 
EMBL:M89115 comes from this gene.


1343 
234 


57.576 
66.667


3786 
3787 


U09453 
Z93386 


EMBL:D715 TTnP-w-acetylglucosamine : 


180 
785 


38.889 

91.270 
87.379 


3788 
3789 
3790 
3791 

3792 


X58906 
AL050015 
U19729 
Z81489 

U80448 
Z37166 


Homo sapiens SimiiaiiLy ^-^ fj-om this 
(SW:P51610); cDNA EST yk205a3.5 comes from

__gSIl -:rx-?;;rb7"c~~iiig^i^^~cD^ 

unknown coded ror by c. ciuj coded for by 


606 
458 
1055 

1112 

2817 
2437 


100.000 
40.000 

1 54.493 

100.000 
100.000 


379: 
[379^ 
[379 

1 379 


1 X04494 
5 AF01581 

r AJ00739 


^^^^^^^^^^ 


1900 
3166 


96.564
100.000 






TABLE 5



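(A=Alanine, C=Cysteine, D=Aspartic Acid, E=Glutamic Acid,
F=Phenylalanine, G=Glycine, H=Histidine, I=Isoleucine, K=Lysine,
L=Leucine, M=Methionine, N=Asparagine, P=Proline, Q=Glutamine,
R=Arginine, S=Serine, T=Threonine, V=Valine, W=Tryptophan,
Y=Tyrosine, X=Unknown, *=Stop Codon, /=possible nucleotide
deletion, \=possible nucleotide insertion)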



A***GLLPPRWL 



SG LPDPRDP — =r^„„„. , * nGGPREGLSMPGRSPGQGYLRAAHLS 



.GQKQIAGLGVIINVGT^ 



SSSYNSSST] 
T.NH*CHMYCIPKF' 



;^PSLLLPLNYYPSPASPP 



^i^5^SYPSLPSLLTSPLPLPLP^^^ 



TSI^FQQQTPSTTOP^g^^^^-^ 



GLL*RTVTGDETGLFQYDPENKAQ/SKPRGGSGF 



^iSsSGQFSAVraTKDPPDTSRPR 



HTPSKDPIGMfflPGSITQNPSSUtsui^--^— ^ 




\VMCTWHDHKETFI.TKY1.YEI»TR^- 



1176 



VSSHDSGFISQDAFQSKSPSPMPPEAPNQ] 



RGTWDRUWAFNLYDI^GCITKEBm^^ 

REHVESFFQKMDRNKDGWTir 






TABLE 9 



TISSUE 
ORIGIN 



RNA SOURCE 



HYSEQ 
LIBRARY 
NAME 



AB3001 



- fJI - - b 

3398 3403 3420 3428 3431 3439 3490 3 ^^^^ 

3571 3753 3814 3822 3854 3956 3994 4 

4101 4115 4159 4167 4196 4228-4229 423 ^^^^ 

4246 4259 4263 4267 4296 4338 4356 ^^^^ 

4407 4470 4494 4502 4521 4529 4549 ^ 

4647 4653 4661 4679 4681 4705 4744 4 ^^^^ 

4805 4867 4869 4900 4902-4903 4913 ^^^^ 

5073 5154 5208 5245 5249 5261 52 ^^^^ 

- i 5 £ ^3 efe? f^r. 

6119 6147 6156-6157 6159 ^ g224 6226 

6196 6200 6203 6215-6216 6221 6^ ^^^^ 

"s^o etii -fe 
r.3° r/s! r4i f,f. - ..s .,0 

6785 6810 6842 6844 6854 686B 6|8° 68 ^^^^ 

6900 6909 6918 6921 6932 6956 6957 ^^^^ 

7030 7034 7041 7045 7064 7073 7088 7 ^^^^ 

7125 7136 7163-716S 7172 7211 ^^^^ ^^^^ 

7310 7335 7365 7379 7405 ^^^^ 
7590 7599 7605 7608 7613 7618 7630 

7642 7644 7646-7647 7649 7654 -^^^^ J ^^33 
7699 7711 7719 7721 7726 7727 773I 

III - l^J^ 

"^^^^^^^^^ 

8408 8411 8427 8434 8448 8451 846 ^^^^ ^^^^ 
8528 8545 8547 8560 8565 8580 B 
3643 8671 8683 8777 8795 1 8820 8^^^ ^^^^ 

T.l 9188 9201 922I 9226 9260 9420 9431 
9077 9142 9188 9201 »^ q733_9734 9745 9811 

10398 10404 10407 . " 












;5iT^4-9555 9834 983» 9941 "52 1°°" 
:"o4 10125 10133 10167 10235 10339-10340 


uterus


Clontech


UTR001


iiiiillil 
iiiiiiiii 

■■111 

issssiiii 

10344 10358 10375 10407 . 1 










DESCRIPTION 




SMITH- 
WATERMAN 
SCORE 



X13783 



Z81503 



-^^^^-^^^^^^^fy^^llT^l.^sli^ comes 

similar to collagen; cdna liai _ 



34.043 



36.190 _ 
38.710 




: : — ~r~r7m?TPP isome 



V01201 
X17025 



n.cn-uv'" — =1 ,,_ac.t- IPP ISOnieiao^ 

pj,^2ii:i:iiiv_E?^^^^^^°" ' 

■ ^^^Ai=-na hitin 




27.381 

" 82.558 
70.732 



coxii 



iP^iiiiiiriiii^iSiiiiEii^^ 



35.780 







10396 



10397 



AL109636" 



10398



10399



10400 
10401 



10402 
10403 



10404 



10405 
10406 
10407 



D10376 



l^^j^^^T^iil^^Hh^drial adenylate kinase 
isozyme 3 



A01771 



Z99162 
AF131851 



M55542 



AF026528 



AL021106 



AF132157
AE000717



10410 



Z48334 
AC006955
U70669



protein — — 

Schizosaccharomyces pombe
Homo sapiens Unknown 



Rattus norvegicus stathmin-like protein

version: ""084"") ; ^ 



version: uo-« / - 
/prediction=(metho^^ 

-^-—zr.y,^A^^^ ^lP.aans F10B5 . 8 ^ 



Wfexlaeoli^usjiy^^ 
"Sii^^^^hibdltiililiian^^ 

-r— -r~::rrTr;7r~^ais^niiin . 



2092 
137 



1215 



AF120102 Homo sapiens calsenilin






TABLE 11




M=Methionine, S=Serine,
P=Proline, Q=Glutamine, R=Arginine,

nucleotide insertion)



-TXKWWKAYDKENLFCEEG^Tr7>?>qN 



HADLCTLSDKDRPi i ift^- • 
- . — ^TTr.T^tTr,T.aTaHl.GIY ' PREMKT' 




„.=TAFV/«'VFLVHKFK 






iNKEDYDLSKMEDFIN 



KTDKreSKTEAYI.EAlKlU.lKKi-KKHDKKGl 



^^^^^^^^ 



====== 

EFSQDTF^mJ^IFFKAKWraPFSRYQTQK^^^^^ ^^^^ 
EAALQPQTLRKWGQLLLPSLLD^LPR^ 
^^^^^^^^^^^^^^^ 






TABLE 12 




SEQ ID NOS: 



1354 1501 1988 2153 2156 2158 
2362 2380 2528 2586 2895 ^^^^ ^^^^ 

4049 4067 4218 422b *^ 4778-4779 4781 
4428 4439 4698 4702 4753 4778 4 
4786 4798 4879 4899 4977 5074 51 
5155 5201 5209-5210 5307 5520 55^^ ^^^^ 
5699 5725 5768 5944 5952 598 ^^^^ 
6223-6226 6229 6268 6361 6390 b 
6847 6949-6952 7037 7116 7228 75 ^^^^ 
7553 7555-7559 7561 7564 7637 
7852 7924 7930 8095 8125 8145 81 
8200 8284 8286 8358 8433 f ^^^^ 

'''' IIW 9 4'r079 9108 9146 9195 

8902-8903 8935 8964 90/y ^^^^ 

'.S 98 5-9 71 1 002 10037 10073 
9694 9807 9854 9865 as /x ^^^^^ 
10373 10407 10550-10551 10661 10 ^^^^^ 

112:0 Siril 501 11923 11970 

i^^^r-oi2:;oS^^ 

13052 13085-13087 13141 13204 1 
13282 13291 13294 13 "13 1339^ ^^^^^ 
13600 13624 13635 13637 l^^ 53 ^3785 

13650 13652 13655 13715 13/^ 

3r.s s - :i - 3"/o- 

Tof. .862 3879 4030 4043 4049 4064 
3827 3833 3862 38 /a ^ ^^70 
4079 4081-4082 4084 4102 4218 42 
4277 4301 4307 4419 ^^J^ f ^^^2 4841 
4652 4671 4685 4720 4782 47^ ^ 
4871 4879 4881-4883 4886 4892 ^^^^ 












T7^?rr2559-12560 12734 12742 12862-12«b^ 
1295I-I29I5 13033 13038-13039 13052 13059 
13^02 "105 13110 13114 13127 13168-13169 
13448-13450 13515-13516 13640 13832 13879 
17889 13942-13948 13970_14054 


uterus 


Clontech 


UTR001


^^^rT^25S 1332 1501 lB03-ibi9 
^630 1679 1940 2160 2281 2326 2343 2360 
2384 2409-2414 2572 3304 3502 3743 3780- 
3783 3785 3993-4010 4112-4114 4216 4225 
4270 4^01 4419-4420 4464 4545 5035 5039 
5222 5339-5340 5520 5532-5545 5699 5739 
^756 5840 5987 6169 6529 6633-6641 6649 
6925 7060 7101-7102 7371 7544 7955-7959 
ITol-lAs 8117-8119 8121 8170 8206-8207 
8212 8286 8322 8402 8684 8739 9075 9159 
92^4 9321-9326 9403 9439 9483 9535 9608 
9774 10098-10099 10203-10209 10425 1067 
10744-10752 10790-10792 11240 11336 11387 
W.IS Il477-ll484 11797 11936 12484-12486 
12488-12191 ^2561-12562 12690 12796 12820 
129I6-I2958 13029 13049 13061 13141 13186 
13^99 lllll 13223-13224 13271 13402 13419 
13451-13453 13517-13519 13537 13635 13655 

' 879 13949 13951^954 1403^ 14037 J 






TABLE 13 





SEQ ID NO:

ACCESSION NUMBER

DESCRIPTION

SMITH-WATERMAN SCORE

% IDENTITY


1 M3

2 AO

3 VO

4 XC

5 M]

6 M




nfifio Homo sapiens ATfaseJ . 

^--^^j^E^:^!^^^^^ 

^0 — 

&riig^E^^2^IP££^^ 


222 
332 
319 

714 
213 
163 
169 


64.423 
64.583 
80.282 
36.196 
42.391 
30.682
80.392 


7 A 

8 X 

9 A 

10 D 

11 X 

12 A 

13 I 

14 I 


gigMi^s:^!^ — 

^^I^IILI^^^^^^^V'.Lr.^^me A'transporter ' 



-—T^^ Li==mr>rtiiim vivax pval . 

i^^Tiirini^^IapieirKI^ 

fi^ii — ^^J^i^ESnE^^^^^^ 


742 
171 
818 
167 

289 

166 

374 _ 

638 


30.872
85.43 
35.443 
80.597
33.01 
55.085 
74.15 


15 I 
' 16 
17 


^^^^^^n^ — 

'I^e nT;F2 encodes a reverse 

T09116 Homo sapiens 0RF2, encuu 

NTP ■ 


248 
162 
241 


37.778 
36.364 


18 
19 

20 


kcyltrnnqf erase .u.,p^^e polyprotein 


" 737 

767 

225 
211 


80 

85.417 

48.227 
32.331 


21 
22 

22 
2 


Caenorhabditis elegans cDNA EST

comes from this gene; cDNA EST yk452h4.5

from this gene

ORF1; MER37; putative

similar to pogo element

NTP — ' 


198 
32] 


66.897
72.34 
50.394 


2 
2 

2 
2 


5^^l3452jH^^^^jg^^ inhibit^ 
Homo sapiens inter-alpha-trypsin

EMBL -.TOO? 05 comes from 


T 78: 

38 
17 

16 


86.395 

J 61.475 
7^ 37.705 

7 34.959 




^^=^^^^^^^^^'' 

Ltigen_2Zty^ 

33X0314rZji™Hll^^ 

3?Mi53ir:i]l^°^ 


15 

21 

21 
6 
1 
2 
6 


0 61.702 
9 30.602 

57 41.045 
36 71.324 
10 34.959 
33 sol 
721 83.704 






comes from this gene;" 
Homo sapiens GW128 



Homo sapiens

nnmn --j^'^r^^ seoretogranin III 
ili^S^ulus edwardii reverse 
transcriptase 



Homo sapiens KIAA0401

; ■ « li^y.^A -TfaMnooathv protei 



p^^.r. ■h^rr.esvirus 4 epxr.»H«= ' - 
^^^^j^^^^^r^;^;-!^:!^ retinopathy pro.exn 
/p-t-^rminal. clone XEH.8c}_ 



Homo sapiens FUTl 



nOllK-' oa-j^-^—- — . 

Homo sapiens kallistatin



122 _ 
2980 



2399 

69' 

128 ' 



1128 



86.364 
43.972 






TABLE 14 



SEQ
ID
NO.


Method


Predicted
beginning
nucleotide
location
corresponding
to first
amino
acid
residue
of amino
acid
sequence


Predicted
end
nucleotide
location
corresponding
to last
amino
acid
residue
of amino
acid
sequence
199 


H=Histidine, I=Isoleucine, K=Lysine,
L=Leucine, M=Methionine, N=Asparagine,
P=Proline, Q=Glutamine, R=Arginine, S=Serine,
T=Threonine, V=Valine, W=Tryptophan,
Y=Tyrosine, X=Unknown, *=Stop Codon,
/=possible nucleotide deletion, \=possible
nucleotide insertion)


jl 
1 ^ 


A 
A 


2 
1 


2321 r 




1 3 


A 


2 


517 




4 


A 


471 


911 




5 


A \ 


1 


1250 


^mGISTGVLGLHI,LPSLHVFPAPPSM!MPPGESFPASGI,CPPRS 


6 


A 


31 


319 


GE^CK\lQRPCItKS™KMLPVI.VLEFRICWMFHIDELFFSFVE 


7


A 


125 


421 


mg^Sotdkevkfknppggdbwkepkgtsvpgtopa^ldse 


8 


A 


2 


319 




9 


A 


1 


729 


2S?S^YHAFPKQSHPEIISVLEKVIPLISDMI.E«KLTDLCTKI 


10 


A 


1 


283 


TYTY 


1 
















I 
I 


FEISPSVDNFGVYIDNYHCDPNDKVSCPRTLIVRHt. lyr^v u 
.^LCA ^^,n,TT?ATPVTLCNASCYFIP'NEGVPGDSTRKCJMi; 


6381 


A 


3 


366 




6382 


A 


3 


367 




6383 
6384 


A 
A 


136 
119 


437 

226 
1862 


^^^==== 


6385 

6386 
6387 

6388 


A 

A 
A 

~ A 


1625 

394 
1177 

643 ~ 


473 
1787 

754 


Shthththt^ 

^^^^^ 

-^l^GTCPVTPPTPSD^g^ 

^^^^^^^^^^^ 


6389 
6390 


A 


76 




304 
365 




6391


A 


3 


354 


^.KT.AVTRGVTTKELYPEFGLDMND __ 












TABLE 15 



TISSUE 
ORIGIN 



adult brain 



RNA SOURCE 



GIBCO 



HYSEQ 
LIBRARY 
NAME 



AB3001 



SEQ ID NOS: 



adult brain 



GIBCO 



ABD003 



o A n 77 140 433 508 82 8 952 1084 ll^a x.l34 
1183 1446 1555 1615 1689 1788 1803 1865 1886 
1931 1945 2031 2058 2154 2453 2689 2724 

ssiFSSississri 

nil ^ III S674 

1324 nil 11% S878 5889 5S» 5899 5905 

5913 5931 5946 5984 6022 6041 6059 

Il41-6li2 6^6-6147 6154 6222 6255 6292 6298 
63136346_6364 



\X 657 689 720 782 785 801 809 811-812 828 
2121 "453 2469 2496 2512 2534 2555 2581 

7412 3432 3434 3436 3446-3447 3453 3456 

iiiSEiiSSSs 

iiiisssiii-ii 

irjr,rs; :s ss s ss 
sra Slants s; nr. -.„; 

Sli-SiiS-Siii 

lyiiLiyiiiSi^ 






liliiiii 

^^^l ^occ fi302 6375 6380 






TABLE 16 




SEQ ID NO:

ACCESSION NUMBER


DESCRIPTION


SMITH-
WATERMAN
SCORE

156


% IDENTITY
44


1 U93564
2 M76546


Homo sapiens putative pxa^^ .

Helianthus annuus hydroxyproline-rich

protein.


169 
163 


36.364 
34.94 


3 Z98980 

a 

4 L21990 H


Schizosaccharomyces pombe Wiskott-

Aldrich syndrome protein homolog 1.

Homo sapiens spliceosomal protein


198 
270 


32.576 
31.264 


5AF153062 C 
P 

6AF044601 H 
1 F 


pro-alpha1(I) chain.

Homo sapiens orphan G protein-coupled

receptor; GPC-R


180 
175 


36.17 
44.595 


7U61953 C 
1 


Caenorhabditis elegans No definition

line found


322 


67.708 


8 AF010144 W 
9 Y10018 I 


^^^i^-^iil^^^Tl^i^I^Si^th^ protein 

^^^fep^iiT^iri^^ia^ 


185 
140 


25.568 
33.333 


10M13100 I 
11U94492 r 
12 AF055985 


^^r^^^^i^^^^^^£2^ 

Meloidogyne javanica collagen


163 
141 

184 


34.286 
41.667 

35.227" 


X J 1 rlo U -5 *± ^ 

14 pi0493 


product ■ 

^^;^;^j^±^^vi£^2Jii2i22::^ 

l^^^^a^^^^i^V^^ myc protein 1 


296 
318 


43.151 
48.062


15pi0493 
16 M22334 
17U67056 


■ iii-iirnnvm tjrotein 

Homo sapiens unknown protein

Acanthamoeba castellanii myosin I

heavy chain kinase


191 
191 

242 


55.294 
34.872 

28.986


18 M80341
19 AF003535 


*-^=,risr.riptase domain^j_ORF^ 

Homo sapiens ORF2-like protein


243 
815 


51.685 
92.537 


20 D80009
21 D88460
22 L02918 


Homo sapiens KIAA0187 . 

Homo sapiens N-WASP . - 

procollagen type V alpha 


87' 
14^ 

28 


88.636 
[ 35.503 

3 51.754 


23 L24521 
~24 AF003535 


- ii5^;;5-i^^ii^transf ormation-related 

protein - — ; 

Homo sapiens ORF2-like protein


15 
24 


0 41.096 
3 41.88 


25 AF003535 

26 AF010144 


- SS^i^-iliiii^^^^i^I^^^^^^^ 

AD7C-NTP . 


15 
14 


7 42.647" 

"s 49 .206 
4 49.315' 


27 AF003535 

28 AF003535 


Hom^_sap3^ns_0^ 


— 2C 


"7 56.164 


29 AB012223

30 AF118023

31 X77816


Canis familiaris ORF2

Homo sapiens SH3 domain-binding

protein SNP70

Rattus norvegicus PR-Vbeta1

^^^^^r. hf^rpesvirus b Jb8 _ 


li 

~~ 1 
2 


^ 38.462 

52 30.508 
55 52.688 
21 52.055 


32 X83413 

33 X83413 

34 M88593 

35 X05561 


H„m;,n herpesvirus e U8U__ 

-trTTTT^iill^^ type XI collage 
Homo sapiens alpha-1 chain precursor

1°^° (2953 is 2nd base in 


2 

n 1 


87 28.526 
70 40.541 
6369 A

6370 U 


B011540 H 
22456 H 
]< 


Homo sapiens MEGF7

Homo sapiens AMP-activated protein

kinase homolog.

Aquifex aeolicus hypothetical protein


10912 
175 


100 
34.884 

24 


6371 P 

6372 t 

6373 I 


^000762 F 
^F159055 i 
I 

D30747 i 
>C52164 t 


Homo sapiens leucine zipper-like

protein

Acropora donei mini-collagen

Mus musculus Q300 protein


123 

127 
131 


64.516 

33.333 
38 
100 


6374 . 
6375 
6376 
6377 

6378 


1^014516 

A31036 

^412623 

D29833 


ZZrrZ^TS VTAA0616 orotein 

Homo sapiens k.iakw°xo jj^y^ — 

Nicotiana alata PRP2

Homo sapiens high mobility group

protein 17

7«v,c. T-i-roline rich peptide f-e 

Homo sapiens proline xx>-ii y f 


4273 
130 
351 

75 
207 


31.579 
93.103 

36.111 
100 


6379 
6380 
6381 


270292 
M94131 
M15885 


Homo sapiens chemokine i 

Homo sapiens mucin . 

Homo sapiens seminal plasma protein
precursor 


9126 
801 

603 


100 
99.123 

76.119 


6382 
6383 


M22865 
AF010427 


Homo sapiens cytochrome

Hepatitis E virus ORF-1; hypervariable

region.


118 
134 


58.621 
66.667 


6384 
6385 


AJ131190 
S79410 


Homo sapiens FANCA protein

Mus sp. nuclear localization signal
(NLS)-binding protein=spot-1


16C 

ii: 


57.895 
85 


6386 
638' 

638 


AJ388513 
7 AF039052 

3 X67703


Canis familiaris ribosomal protein LP
Caenorhabditis elegans No definition

line found

Drosophila melanogaster Msco*bu


18: 

9 
11 


1 29.299 

5 48.148 

2 41.667 


638 
639 
639 


9 AJ388550 

0 X01779 

1 AL109822


Canis familiaris hypothetical protein
Hordeum vulgare C-hordein fragment.
Schizosaccharomyces pombe hypothetical
protein


12 
11 


8 33.333 
7 26.027 
TABLE 17



SEQ 

ID 

NO 


Method


Predicted
start

nucleotide
location
for amino
acid

sequence


Predicted 
stop 

nucleotide 
location 
for amino 
acid 

sequence 


corresponding SEQ ID NO.
(A=Alanine, C=Cysteine, D=Aspartic Acid, E=Glutamic Acid,

\=Possible nucleotide insertion)


1|C 
2B 




23 
1 




76lM 
735 M 
G 
M 
E 
P 
D 


SGDISGNGMFLREEGIUN mLVT^^^^^^^ 




3E 




1 

104 


783 W 
P 

^ 

( 

266 


STSFSATTTCQETPPQLPICPQMF 
^ipTHTPWiPRQE^ 


4 
5 


B 
B 


1 


1113" 
1 164 


SS^pIcrlIssappqlgvptkslfpqadkqtk 




5C 
7 B 


11 


2 27? 


KiFRiQYFWVPPPT^^ ' 
i|»;GPASRCGBCGGARi^^ 




8B 
98 


19 


8 40 
1 68 


7 SiSHRAVVPPCCT^^%^ 
TGAVAVDLLKWKDMlQtjUjNL 1 iLrnm... 




oc 


1 ^ 


46 6C 


nMLLSCSSWVLVASCVVun-SDIFSGGKTKCISLGCQINSALTSI-HlALEL 
° wSSLYEiGWCSMlELNLKDAFDSFERL 

PVNQLLTALPVDVLGSLCitLSSPWSAFALhK 



SAWGpVn/YAVDYLSPNPMVMVLGAWI 



^^^^^^^ 

?rsrRS«fQ»-K^S,TNAATAUTS^^^ 

QQSTVHRD 



^?^y'E''^SpDWAlSR^^^^ 
I?NSLV^mK?MRTPQRN^WRNG 



TLTGAAKTALGIASMHCSAJ 
LIGDDEHGWDDD 



MTINREPQRG 



PMEDGQTRGANAFYSVMEQM 






TABLE 18 



85 3487 3S92-3S95 3733 3990-3994 4108-4XX3 
4201-4203 4599-4601 4622-4623 4653-4655 
AOTS 4881 5529-5530 5649-5652 5746 5848 
6026 60 0 6X84-6185 6453-6454 6502-6503 
7263-7266 750X-7503 7506-75X4 7590-75 5 

9577-9b"b 9582-9585 9646-9649 9663-9664 
9II2-9913 99X5-99X6 9935-9956 9966-9969 
inio8 10013 X0030-X004X X0047-X0053 X0086- 

16593-16594 16709-16711 16787-16794 16801- 

ir498-i BO2 19528-19530 19S35-X9544 X9555- 
19557 X9618 19675-X9677 1968X 197X9-X9722 
19808 19824-19825 X9842-X9847 1985X-19857 
19887 19919-19923 20037-20045 20049-20054 

0059-20062 20085-20097 20X3X-20X33 2 145- 
20159 20203-202X0 20215-20218 20227-20228 
?0238-20243 20277-20280 20283-20287 20296- 
2^29? 2S305-203IO 20314-20315 20331-20334 
20397-20399 20406 20438-20441 20466-20471 
78 2S5fo-20517 20532-20534 20B54.205 7 
90641-20652 20659-20662 20671-20675 20725 
?n730 20732-20733 20764-20770 208X6-20825 
20836-20843 20845-20850 20908-209X0 20925- 
fo93X 20963-20968 2X0XX-2X012 21049 21061- 
21065 21X25-21X30 2XX53-2XX57 2XX89-21193 

°X9l2i202 2X264-2X267 l^lll'lllll " 
91281 2X400-21402 21405-21411 21538 2154j 

21567 2X613-216X9 2X623-21628 21775- 
2i783 2X857-2X86X 22057-22063 22086-22089 
'oo'n -,2102 22X06 22X7R-22-.34 22X54-22X57. 



ri7i6^8064-18068 18122-iiI^nF635 18764- 
18767 18782-18786 18870 18898 18934-18943 
llool-lloll 19008-19012 19045-19048 19145- 

' 19148 19183-19192 19475 19483 1^88-19492 
19500-19502 19555-19557 19675-19677 19724- 
lllTo llsOS 19842-19847 19879-19882 19932- 
Islll 20042-20045 20049-20053 20066 20097 
^0141-20159 20236 20238-20243 20262-20264 
20267-20270 20406-20413 20415 20445-20447 
9^483-20485 20518-20522 20544-20548 20586- 

llTsl 22218-22219 22315-22320 22346-22352 
22 79-22380 22389-22392 22486-22487 22490- 
22492 22501 22533-22536 22699-22709 22750- 
lllll 22772-22779 22791-22792 23019-23023 
23088-23090 23102-23103 23181-23184 23228- 
,323! 23275 23560-23562 23574-23582 23678- 
23679 2416^24171 24268-24282 25223-25228 
5 8 -2B481 25515-25516 25557-25558 2636- 

----- 

r^;6^l2^?^---0l2^^ 

28600-23^02 29514-29515 29555-29556 30145- 
30146 30334-30340 . 






TABLE 19 



SEQ
ID
NO


ACCESSION
NUMBER


DESCRIPTION


SMITH-
WATERMAN
SCORE


%
IDENTITY






3379

3380

3381

3383

3384


AJ133271
AJ133271
AJ133271
Y12713
Y12713


Gorilla gorilla atrophin-1

Gorilla gorilla atrophin-1

Gorilla gorilla atrophin-1

Mus musculus Pro-Pol-dUTPase polyprotein

Mus musculus Pro-Pol-dUTPase polyprotein

Mus musculus Pro-Pol-dUTPase polyprotein


51 
149 
149 
344 
634 
369 


29. /iJ 
27.536 
27.536 
60.825 

87.85 
82.353 






3385

3386
3388
3390

3391

3392


Y12713
Y12713
Y12713
AL049758
AL049758
Y09443


Mus musculus Pro-Pol-dUTPase polyprotein

Mus musculus Pro-Pol-dUTPase polyprotein

Mus musculus Pro-Pol-dUTPase polyprotein

Homo sapiens dJ437M21.1 (novel protein)

Homo sapiens dJ437M21.1 (novel protein)

Homo sapiens alkyl-dihydroxyacetonephosphate

synthase precursor


523 
467 
237 
344 
1266 


38.776 
68.807 
70 
47.682 
85.965 

92.991 






3393

3394
3395
3396
3397


Y09443

X79536
X79536
X79536
X79536


Homo sapiens alkyl-dihydroxyacetonephosphate

synthase precursor

Homo sapiens hnRNPcore protein A1

Homo sapiens hnRNPcore protein A1

Homo sapiens hnRNPcore protein A1

Homo sapiens hnRNPcore protein A1


1243 

1959 
593 
315 
1486 
8177 


91.875 
51.261 
95.652 
98.4 
100 






3398 
3399 
3400 
3401 
3402 

3403 
3404 
3406 
3407 
3405 


U34360
U34360
AL033534
AL117470
U22376

L11672
AB006628
AB006628
AL110124
5 AL110124


Homo sapiens LAF-4

Homo sapiens LAF-4

Schizosaccharomyces pombe serine-rich protein

Homo sapiens hypothetical protein

Homo sapiens alternatively spliced product using exon

13A

Homo sapiens zinc finger protein

Homo sapiens KIAA0290

Homo sapiens KIAA0290

Homo sapiens hypothetical protein

Homo sapiens hypothetical protein


205 
165 
28l1 
313 

2454 
288 
183 
60^ 
215 
20^ 


59.322 
30.46 
40.764 
69.231 

46.939 
36.029 
64.216 
) 53.684 
\ 94.59£ 






340^ 
341 
341 
341 
341 
341 


3AL110124 
3 ALII 01 2^ 

1 Y'^RQ'^? 

1 AODSO^ 

2 X56932 

3 X56932 
4AF01014


Homo sapiens hypothetical protein

Homo sapiens hypothetical protein

Homo sapiens 23 kD highly basic protein

Homo sapiens 23 kD highly basic protein

Homo sapiens 23 kD highly basic protein

4 Homo sapiens neuronal thread protein AD7c-NTP


377 
82 
50 

126 
28 
25 


3 97.87< 
D 80.66 
5 10 
0 98.51 
9 39.72 
"7 26.62 


\ 
I 
3 
5 
6 
1 




341 


5 AL13289 


B Caenorhabditis elegans preaiuieu uo...a 

nreliminarv prediction , — 








7 92.30 


8 






341 


6 Z72499 


Homo sapiens herpesvirus associated ubiquitin-specific

protease (HAUSP)


C o<- 


)7 32.71 


g 




34 


"7 Z72499 


Homo sapiens herpesvirus associated ubiquitin-specific
protease (HAUSP)


c 5J 


26 99.6: 


J7 




34 


18 Z72499 


Homo sapiens herpesvirus associated ubiquitin-specific
protease (HAUSP)


Ic 73 



30404 AF 


039023 He 
"039023 He 
;i002400 H 
bi 


Homo sapiens Ran-GTP binding protein; RanBP6

Homo sapiens Ran-GTP binding protein; RanBP6

Homo sapiens Gene product with similarity to Ubiquitin
binding enzyme


1020 
7063 
1536 


86.188 
97.922 
95.102 

90.741 


30406 A 


3002400 H 
b 


Homo sapiens Gene product with similarity to Ubiquitin
binding enzyme


341 


98.876 




C002400 H 
b 


Homo sapiens Gene product with similarity to Ubiquitin

binding enzyme


3437 


60.63 


OU*tUO IV 


24903 H 
2 


Homo sapiens gamma-glutamyltransferase 1 (EC

2.3.2.2)


344 
3687 


100 


30409 ^ 


124903 H 
2


Homo sapiens gamma-glutamyltransferase 1 (EC

2.3.2.2)




98.611 




/I24903 I- 


Homo sapiens gamma-glutamyltransferase 1 (EC

2.3.2.2)


3633 


40.909 




i/l24903 1 


Homo sapiens gamma-glutamyltransferase 1 (EC
2.3.2.2)


46 
1311 


85.156 




^24903 


Homo sapiens gamma-glutamyltransferase 1 (EC
2.3.2.2)




95.671 


30413 


^24903 


Homo sapiens gamma-glutamyltransferase 1 (EC

2.3.2.2)


1369 


79.701 


30414 


M24903 


Homo sapiens gamma-glutamyltransferase 1 (EC

2.3.2.2)


2808 


96.701 


30415 


M24903 


Homo sapiens gamma-glutamyltransferase 1 (EC

2.3.2.2)


3357 


82.89 


30416 
30417 


M24903 
U02390 


Homo sapiens gamma-glutamyltransferase 1 (EC
2.3.2.2)

Homo sapiens CAP2 

Homo sapiens CAP2 


2944 

80 
346 


36.364 
59.664 


30418 
30419 
30424 


U02390 
U02390 
AF068864 


Homo sapiens CAP2

Homo sapiens p21-activated kinase 3


2204 
137 
26C 


91.02 
55.263 
78.182 


3042£ 
3042e 
3042" 


AF068864 
AF068864 
^ AF121080 


Homo sapiens p21-activated kinase 3

Homo sapiens p21-activated kinase 3

■ Mus musculus cAMP inducible i protein 


288: 

5ie 

37f 


87.184 
5 85.106 
3 79.452 


3042J 
3042 
3043 
3043 
3043 
3043 
3043 
3043 
3042 


J AF12108C 
3AF12108C 
3AF12108C 
1 AF12108( 
Z Ar 1 z 1 uo 
3AF12108 

4 AF12108 

5 AF12108 
6U17195 


" Mus musculus cAMP inducible i protein 

" Mus musculus cAMP inducible i protein 

) Mus musculus cAMP inducible i protein 

) Mus musculus cAMP inducible i protein 

D Mus musculus cAMP inducible i protein . 

D Mus musculus cAMP inducible i protein 

0 Mus musculus cAMP inducible i proiein__ 

n Mus musculus cAMP inducible i proiem ^ 

Homo sapiens A-kinase anchor protein.


33' 
5 
68 
5 
6 
132 
283 
561 
38^ 


1 87.5 
9 43.75 

3 58.019 

4 25 

J ZO.aOD 

3 53.012 
3 72.205 

2 99.424 
1 99.831 


304C 
305^ 
305/ 
305 
305 
305 
305 


7 U17195 
[7 X98743 
\8 X98743 

50 X98743 

51 X98743 

52 X98743 

53 X98743 


Homo sapiens A-kinase anchor protein

Homo sapiens RNA helicase 

Homo sapiens RNA helicase 

" Homo sapiens RNA helicase 

Homo sapiens RNA helicase 

Homo sapiens RNA helicase 

Homo sapiens RNA helicase


i: 

295 
5 

12 
30 


.8 62.185 
36 94.059 
39 90.517 
75 75.986 
57 21.25 
82 88.694 
TABLE 20



SEQ
ID
NO.


Method


Predicted
beginning
nucleotide
location
corresponding
to first
amino
acid
residue
of amino
acid
sequence


Predicted
end
nucleotide
location
corresponding
to last
amino
acid
residue
of amino
acid
sequence

.serine, T=Thr^^na, ::™^^-J;^^„,7.po?sible 
nsertion) 


2 A 


239 
1441 


322 
2130 




3 C 

4 A 


36 
109 


236 
300 


T7T*HAPDPRPLY 


5 A 

6 A 

7 B 

8 A 


2 . 

27 

50 
379 


74 
375 

204 
623 


!sPQ*PCQAGVTLSRLQTTNSPRPHSQKGLRGPRTQTLSLTSQPTACSENS«^ 
i GSQPSPKRTLS ^^ppT,AT,M^'--^^^KGCLGVLLENK* 


9 B 

10 A 

11 ? 


185 
29 
663 


366 
308 
1270 


"^^f^^^!yi.v,^..../KPH.RLQPLPSTPPKtSPL 


12 i 


^ 190 


715 


S?"SvraSESHKLLIGV<3PKGISICRDYFSPIHRIAVPWQMATQ 


13 


A 270 


713 




14 


A 1575 


1968 


_„^T,,pr,r.o*ft3WT.TfRPAHPCRDQLGH — - . ^-f" 


15 


A 185 

1 


721 


SS?Xci«?S"«TLPDPK.™.T«KT.»EVQH 

p.,.T,^,.TTvTOTKPfiEITPHIYNYLIF 






1 






^ 
1 




5818 


A 


2 


1815 


SSISpsSvstspvatsapptlgqpkgvsasqdrkipppigterlarir 


5819 


A 


1 


394 




5820 


A 


2 


1785 


PVGGLLSFNRQHF/SFPHPWXTSASNSCDSPIPSVSS^^ 


5821 
5822 


A 
A 


3 

3448 


125 
3831 


^SwDYRYQLPP- /MP/WFLVETGFHHAAQAGSQTPDLR J 



TABLE 21




SEQ ID NOS: 

356-359 460-461 474 507-508 559 5 
675 733-735 780-782 820 821 e-io _ 
X005-1008 1011-1024 1034 1038-1039 10^^ ^^^^ 
1063 1068-1074 1083-1084 1098__^^^^ ^363-1364 
1203 1214 1235 1265 1257 i ^453 1455- 

1373-1374 1402-1403 1412_^4i; ^^^^_^si9 
1457 1462-1470 1491 1497 14| ^^^^^ 1576-1581 
1522-1524 1531-1533 1535 3 1659-1664 

1609-1611 1625-1627 1644 Ib^ 1726-1731 
1668 1702-1703 1707 1710-1711 171 
1774 1781 1789 1800-1808 ^If^ i909-l910 
1853 1857-1864 1876 188^ l|f„ I982-I984 2013- 
1941-1942 1963-1965 1975 2178-2185 
2014 2054-2057 2092 2156 2172 21 ^236 
2194-2196 2200 2202 2203 2216 22^^ ,3,3.2377 

^lll All nil 2407-2408 2450-2451 2481- 
2380-2382 f l^'V'.^^ 2497-2502 2534 2547 2587 
2482 2486 2491-2494 2497 2651 2661 

2622-2625 2627 2635 2642 2644 26 
2690 2703 2706 2717-2720 276 28^^_^^^^ ^^^^ 
2885 2922 2989-2991 2994 3128-3129 3134- 

3036 3096-3097 3112-3115 312° 312^_3,3, 32^3 
3136 3139-3140 3167 3219_^ 3430-3431 3434 
3396-3397 3399-3408 3427 34^ 

3458 3467 3473-3474 3480-3482 349 ^ 
3521 3538-3539 3542-3544 3551 3552 3,^, 
3586-3587 3608-3609 3662-3663 37 
3806-3813 3828-3832 3858 3964 3^^ ^ 4144- 

4005 4012-4013 4047 4101-4104 41^^ 4345-4346 
4145 4171 4246-4248 4285 4339 ^g_^^33 4524. 



1-11 52 64 82-83 1^^ — ; 

264-266 287-292 323-325 345 ^ 
375 391-392 400-402 406 474 50 
578 590-593 598 614-615 624_674 _^ 
701-702 706-708 743-747 768 /b 
826 840-842 844 848-849 867_87i 
902 905 913-917 920-928 941 ^ ^^^3 
960 963-969 977-989 ^91 ^^^34 io38-1040 

1005-1008 1011-1024 1031-1032 1" 
1049-1051 1059-1063 1065-106910 
1090 1094-1098 1100-1101 1103 11 
1123-1124 1128 1146-1148 116/ 226 1234 

1186-1194 1203 1214 1216 1219 1222 1^ 
11237 1242-1247 12ei-12f f f 12|3_l3^, ,3,,. 
I 1309-1311 1314-1320 1339 1401-1404 
1362 1370 1373-1374 1380 138« If 45^ ^473 
1435-1436 1447-1448 1450 1453_1 ^^3^.^333 
848-849 963 990 1034 1040 1051 11 .^^^^
1265-1267 1270-1272 1295-1297 ^^^^ 
1449 1462-1470 1486-1487 1491 1 ^^^^^ ^,^4 
1549 1562-1565 1624 1669_1|71 l^^^_^g^3 
1743-1744 1752 1789 1814 1« 1928-1931 
1851 1876 1883 1885-1886 1901 2168-2175 
1975-1980 2015 2058-2061 2092 21 
2200 2205-2207 2217-2218 2292 2649 
2380-2382 2395-2397 2453 24^2 ^ ^^^^^^^^ 
2674 2765 2818 2892 2895 g.3417 3687 

3031 3053 3111 3116-3117 32^4341^ ^024-4027 
3719-3723 3848-3851 3983 400 ^533. 
TABLE 22



SEQ ID A 
NO: Nl 


ZCESSION DI 
□MBER 


1 s: 

ASCRIPTION ^ 

S 

: 1 

eishmania ma] or 


4ITH- % 

\TERMAN 

CORE 

91 3 


IDENTITY 
3.906 


4 U 


J243460 L 
P 

42580 P 
1 
A 


PAPK (17X) ; similar to PBCV-1 ORF 
41R, corresponds to Genbank 
r^nPfision Number U17055 _ 


9 3 


4.694 

J4.783 1 


r \ 


J42580 I 
] 
1 

2 


PMK (nx) ,■ similar to PBCV-1 ORF 
^41R, corresponds to laeiii^ai 
accession NumberJJlTOSg ■ ■ - 


8 1 


26.667 


6 


ij42580 


L PAPK (17X) ; Similar to PBCV-1 ORF 
A41R, corresponds to tienud"'^ 
accession Number U17055 


57 


30.769 


8 


U425B0 


1 PWK (17X) ,• similar to PBCT 1 ORF 
A41R, corresponds to GencanK- 
^w^^^^on Number U170|5 _-— - 


71 


26.562 


10 


U42580 


Paramecium bursaria Chioreiici i.u. 
1 PAPK (17X) ; similar to PBCV-1 ORF 
A41R, corresponds to GenbanJc 
accession NumberJJi7055 - 


68 


29.365 


11 


U42580 


-ii^iS^iHi^iirb^^^i^^^ 
1 PAPK (17X) ; similar to PBCV-1 ORF 
A41R, corresponds to Genbank 
^P^^.,o-ir^n Number U17055 . 


164 


29.240 


12 


U80443 


-^^orhabditis eiei^ns contains 
similarity to a band 4^1-like 
domain; Pfam domain PF00373 
(Band4_l) Score=132.4, E=8.4e-42, 


206 


32.911 


13 

14 

15 

16 

17 

18 
19 
20 
21 
22 
23 
25 

26 

27 

28 


U80443 

AF149422 
AF149422 
AF149422 
AF149422 
AF149422 
AF149422 
AF149422 
AF149422 
AF149422 
AF149422 
L24521 

L24521 

L24521 

1 L24521 


' caenorhabditis elegans contains 
similarity to a b^nd 4^;-;^^ ^ 
domain; Pfam domain PF00373 
(Band4_l) Score=132.4, E=8.4e-42, 

N=l " 

~ Homo sap-ipns linknown 

" Homo sapiens unknovm _ 

~ Homo sapT«=-ns unknovm 

~ Homo sapiens unknown _ 

~ Homo sapiens unknown 

Homo sapipris unknown 

~ Homo sapiens unknown 

~ Homo sapiens unknown _ 

Homo sapiens unknown 

Homo sapiens iHiknown 

-j^5;;;3-i^^IiKr^ansformation-reiaT:e 

P^°ts^B — ; 4.^ane-Fm-mation- relate 

Homo sapiens trans iormaciou 


148 

66 
738 
1229 
76 

58 

60 

102 

67 

66 

4623 
d 98 

d 75 

d 314 

5d 297 


26.744 
78.125 
50.117 
40.541 
36.111 
30.769 
82.353 
20.588 
20.571 
25.256 
50.000 

29.474 

71.765 

63.855 
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5801 
5802 



Homo sapiens unknown protei: 



Homo sapie ns xinknown protein _ 
Homo sapie ns vinknown protein 



90.343 
92.793 



Homo sapiens unknown protein 

Rattus norvegicus transmembrane 
receptor UNC5H2 



2695 
3731 



recep tor ujnv-jxi^ 

P^^^.^in {c-terminal, cl one XEH.8c}_ 



prote in <<_-i,c.lu ..ux^^^. ■ , 

Homo sapiens iruiJ^iTriH^thy 

irin --^^r..^. clone XEH.8c} 

-t: rrrrTrr~wr^;w-iTM-ical protein, 



Homo sapiens hyp^^S^^icaT^ 1 
similar to (AC007017) putative RNA 
helicase A 



Homo sapiens hypotheticarprotein 
similar to (AC007017) putative RNA 

hel ic ase A ; 

Homo sapiens hypothetical protein, 
similar to (AC007017) putative RNA 

helicase A . __- 

Mus musculus subtilism-like 
convertase 



■|^;^;^-ii^Iiiir3^inked f tinopa^hT 
P^^^.Hn (C-terminal, clone XEH.8c} 



protein ^ v.- ^-^^ — ' — — — 

Homo sapiens X-linked retinopathy 

r^teiZi£LteEtnin^^ 
ITomo sapiens KTAA1085 protein 



5816 
5817 



AB029008 
AB029008 
AB029008 
AB029008 



KT7^M085 protein 
TTnmn -nr-— ° KTRM085 protein 
ITnmn -^r'^ ^TRMOSS protein 



2424 
764 



AB029008 
AF010144 



Homo sapiens KIAA1085 pr^ein_ 
Homo sapiens neuronal thread 



3446 
500 



protein AD7C-NTP 
integration. The term "transfection" refers to the taking

»c,eotideswhieh.edia.e.heup.a.eofal«edDNAftag»e„t»toace.^^^^^ 

:U.e..ea....o,™UMPsa.ata...se,ue.ceo^^^^^^^^^ 

..p.er-baseasy.en>saescribedbe.ow.T^pres»oeanaa^v,^^ 

::;.a.eof*e.a..erse,»e..Meten,^ea.AsaescHbedaWe,aUMP.n 

inoreasethefrequenoyofuptakeofaliricedmarkersequeBCe. 

Each of the above terms is meant to encompass all that is described for each,
unless the context dictates otherwise.

5.2 NUCLEIC ACIDS AND PEPTIDES OF THE INVENTION

sZlofthenucleic acids andpeptidesofthepresentinvent^on are se^ 

.thei^rUTaHeUelatestheSBQI.^ 
in parent applications from which this application claims priority.
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File Name 
»nCD 

N/A 
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Patent Application 
Attorney Docket No. 21272-502 



S.002 30.200. SEQ>DNO:.-.3>92 



778 09/347,127 Jul. 2, 1999 
09/905,059 Jul. 12, 2001 



SEQ ID NO: 1-94
SEQ ID NO: 1-94



770 09/457 877 Dec. 8, 1999 SEQIDNO: M28 
S.981 Sep. 14.2001 SEQIDNO: 1-128 

782 09/471,275 Dec.23,1999 SEQIDNO: 1-10,451 

09/488725 Jan.21,2000 SEQ ©NO: 1-10289 
r55S7Apr.25.2OOO SEQIDNO:l-10289 

785 09/491,404 Jan.25,2000 SEQIDNO: 1-3796 
09/922:279 Aug. 3, 2001 SEQIDNO: 1-3796 

787 09/496.914 Feb. 23, 2000 SEQ ID NO: 1-3960 
09/560,875 Apr.27.2000 SEQ ID NO: 1-3960 

788 09/515126 Feb. 28, 2000 SEQIDNO: 1-14074 
0^577.409 May. 18.2000 SEQ ID NO: 1-14074 

789 09/519705 Mar. 7.2000 SEQIDNO: 1-6391 
' 09/574,454 May. 19.2000 SEQ ID NO: 1-6391 

700 09/540217 Mar.31.2000 SEQIDNO: 1-30533 
oS67Aug.23,2000 SEQIDNO: 1-30533 

791 09/552,929 Apr. 18.2000 SEQIDNO: 1-5822 
09/770,160 Jan. 26, 2001 SEQ ID NO: 1-5822 



748 SEQ ID NO: 1-45,196 

752 SEQ ID NO: 1-13192 
752 SEQ ID NO: 1-13192 
752 SEQ ID NO: 1-13192 

778 SEQ ID NO: 1-94 

778 SEQ ID NO: 1-94 

779 SEQ ID NO: 1-128 
779 SEQ ID NO: 1-128 

782 SEQ ID NO: 1-10,451 

784 SEQ ID NO: 1-10289 
784 SEQ ID NO: 1-10289 

785 SEQ ID NO: 1-3796 
785 SEQ ID NO: 1-3796 

787 SEQ ID NO: 1-3960 

787 SEQ ID NO: 1-3960 

788 SEQ ID NO: 1-14074 
788 SEQ ID NO: 1-14074 

789 SEQ ID NO: 1-6391 

789 SEQ ID NO: 1-6391 

790 SEQ ID NO: 1-30533 
790 SEQ ID NO: 1-30533 

791 SEQ ID NO: 1-5822
791 SEQ ID NO: 1-5822



N/A 



N/A 



N/A 



77.408 May 



f)m NO: 1-8502 _ 



Table2(782).doc

Table3(784).doc; Table4(784).doc

(Tables 5 and 7 are hard copies)
Table6(785).doc

Table8(787).doc; Table9(787).doc;
Table10(787).doc

Table11(788).doc; Table12(788).doc;
Table13(788).doc

Table14(789).doc; Table15(789).doc;
Table16(789).doc

Table17(790).doc; Table18(790).doc;
Table19(790).doc

Table20(791).doc; Table21(791).doc;
Table22(791).doc

(Table 23j5 Me hard copies) 



7AR ^VO ID NO: 1-45,196, 752 SbQ lu inv^. 1 
the nucleotide sequences of 748 SEQ lUNU 784SEQID 

KO: ,.10.2S9. 7S5 SEQIDNO: '''"^^'^7^33 1-5822, 

789SEOIDNO: 1-6391, 790 SEQ ID NO. 1-30,533, >.! 
::r9;XNO:l.S50.apo..»c,eo..e.^^^^ 
6.0 EXAMPLES

Selected tables are provided on the CD-ROM supplied concurrently with this
application. The entirety of the contents of the CD-ROM are intended to be a part of this
application and are incorporated by reference. The electronic file name, where applicable,
follows the first reference to the table throughout the application.
6.1 The 748 Gene Family

Novel Contigs

The novel contigs of the invention were assembled from novel expressed
sequence tags (ESTs) isolated by methods described herein (e.g., SBH) and, in some
cases, sequences obtained from one or more public databases. The inserts for the cDNA
libraries from which the novel ESTs were obtained were amplified with PCR using
primers specific for the vector sequences which flank the inserts. These samples were
spotted onto nylon membranes and interrogated with oligonucleotide probes to give
sequence signatures. The clones were clustered into groups of similar or identical
sequences, and single representative clones were selected from each group for gel
sequencing. The 5' sequence of the amplified inserts was then deduced using the reverse
M13 sequencing primer in a typical Sanger sequencing protocol. PCR products were
purified and subjected to fluorescent dye terminator cycle sequencing. Single pass gel
sequencing was done using a 377 Applied Biosystems (ABI) sequencer to obtain the
novel ESTs. The novel contigs of the invention were assembled from the novel ESTs
and, in some cases, sequences obtained from one or more public databases. The nucleotide
sequences for the resulting contigs from the 748 gene family are designated as 748 SEQ
ID NO: 1-45,196 and are provided in the Sequence Listing.

6.2 The 752 Gene Family

Novel Contigs

The novel contigs of the invention were assembled from novel expressed
sequence tags (ESTs) isolated by methods described herein (e.g., SBH) and, in some
cases, sequences obtained from one or more public databases. The inserts for the cDNA
libraries from which the novel ESTs were obtained were amplified with PCR using
primers specific for the vector sequences which flank the inserts. These samples were
Libraries were spotted on nylon membrane filters and screened with oligonucleotide
probes to obtain signature sequences. The inserts were amplified with PCR using primers
specific for the vector sequences which flank the inserts. The 5' sequence of the
amplified inserts was deduced using the reverse M13 sequencing primer in a typical
Sanger sequencing protocol, as well as internal primers in the forward and reverse
directions. In some cases RACE (Rapid Amplification of cDNA Ends) was performed to
extend the sequence in the 5' direction. In all cases, all of a single cluster was
sequenced to generate overlapping clones to assemble the contig. Chromatograms were
base called and assembled using a software suite from University of Washington, Seattle
containing three applications designated PHRED, PHRAP, and CONSED. The sequences
for the resulting contigs for the 782 gene family are designated as 782 SEQ ID NO: 1-
10,451 and are provided in the attached Sequence Listing.

The 5' sequence of the inserts was deduced using a typical Sanger sequencing
protocol. The inserts of the library were amplified with PCR using primers specific for
vector sequences which flank the inserts.

The contigs were assembled using an EST sequence as a seed. Then a recursive
algorithm was used to extend the seed EST into an extended assemblage, by pulling
additional sequences from different databases (i.e., Hyseq's database containing EST
sequences, dbEST version 114, gb pri 114, and UniGene version 101) that belong to this
assemblage. The algorithm terminated when there were no additional sequences from the
above databases that would extend the assemblage. Inclusion of component sequences
into the assemblage was based on a BLASTN hit to the extending assemblage with
BLAST score greater than 300 and percent identity greater than 95%.

The nearest neighbor result for the assembled contig was obtained by a FASTA
version 3 search against Genpept release 114, using FASTXY algorithm. FASTXY is an
improved version of FASTA alignment which allows in-codon frame shifts. The nearest
neighbor result showed the closest homologue for each assemblage from Genpept (and
contains the translated amino acid sequences for which the assemblage encodes). The
nearest neighbor results for 782 SEQ ID NO: 1-10,451 are shown in Table 2, and
identified as Table2(782).doc on the enclosed compact disc.
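
The seed-extension procedure described above can be sketched as follows. This is only an illustrative sketch, not the software actually used: the `Hit` record, the `search` callable (a stand-in for a BLASTN search of the databases against the current assemblage), and the function name are hypothetical. The recursion is written as a loop that repeats until no further sequence passes the stated thresholds (BLAST score greater than 300 and percent identity greater than 95%).

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Set


@dataclass(frozen=True)
class Hit:
    """One database sequence matched against the current assemblage (hypothetical record)."""
    seq_id: str      # identifier of the database sequence
    score: float     # BLAST score of the hit
    identity: float  # percent identity of the hit (0-100)


def extend_assemblage(
    seed_id: str,
    search: Callable[[FrozenSet[str]], Iterable[Hit]],
    min_score: float = 300.0,
    min_identity: float = 95.0,
) -> Set[str]:
    """Grow an assemblage from a seed EST until no new sequence qualifies.

    `search` is an assumed stand-in for a BLASTN search: given the current
    members, it returns candidate hits from the databases.
    """
    members: Set[str] = {seed_id}
    while True:
        new_members = {
            hit.seq_id
            for hit in search(frozenset(members))
            if hit.score > min_score
            and hit.identity > min_identity
            and hit.seq_id not in members
        }
        if not new_members:
            # Termination condition: nothing left that extends the assemblage.
            return members
        members |= new_members
```

In practice the `search` step would wrap an actual BLASTN run; the thresholds are exposed as parameters only so the sketch can be exercised on toy data.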
6.6 The 784 Gene Family

Novel Contigs

Table 3 (identified as Table3(784).doc on the enclosed CD) sets forth the novel
predicted polypeptides (including proteins) encoded by the novel polynucleotides (784 SEQ
ID NO: 1-10,289) of the present invention, and their corresponding nucleotide locations to
each of 784 SEQ ID NO: 1-10,289. Table 3 also indicates the method by which the
polypeptide was predicted. Method A refers to a polypeptide obtained by using a software
program called FASTY (available from http://fasta.bioch.virginia.edu) which selects a
polypeptide based on a comparison of translated novel polynucleotide to known
polypeptides (W.R. Pearson, Methods in Enzymology, 183: 63-98 (1990), incorporated
herein by reference). Method B refers to a polypeptide obtained by using a software
program called GenScan for human/vertebrate sequences (available from Stanford
University, Office of Technology Licensing) that predicts the polypeptide based on a
probabilistic model of gene structure/compositional properties (C. Burge and S. Karlin, J.
Mol. Biol., 268: 78-94 (1997), incorporated herein by reference). Method C refers to a
polypeptide obtained by using a Hyseq proprietary software program that translates the
novel polynucleotide and its complementary strand into six possible amino acid sequences
(forward and reverse frames) and chooses the polypeptide with the longest open reading
frame. When the predicted beginning nucleotide of Table 3 is a higher number than the
predicted end nucleotide of Table 3, then the amino acid sequence is derived from the
complementary strand of the indicated SEQ ID NO. The locations of the predicted
beginning and end nucleotides correlate to the nucleotide sequence of the indicated SEQ ID
NO, not its complementary strand.
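
The idea behind Method C (six-frame translation followed by selection of the longest open reading frame) can be illustrated with a minimal sketch. It is not the Hyseq proprietary program, whose details are not given here. The sketch assumes the standard genetic code and, as a simplifying assumption, treats an open reading frame as the longest stop-free stretch in any frame, without requiring a start codon. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops become '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def longest_orf(seq: str):
    """Return (peptide, strand, frame) for the longest stop-free stretch.

    Assumption: an ORF is any stretch between stop codons; a real
    implementation might also require an initiating methionine.
    """
    seq = seq.upper()
    best = ("", "+", 0)
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for segment in translate(strand_seq[frame:]).split("*"):
                if len(segment) > len(best[0]):
                    best = (segment, strand, frame)
    return best
```

A result on the "-" strand corresponds to the case described above in which the predicted beginning nucleotide is a higher number than the predicted end nucleotide.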

The isolated polypeptides of the invention include, but are not limited to, a
polypeptide comprising any of the amino acid sequences set forth in Table 3 or from six
frame translations of 784 SEQ ID NO: 1-10,289; or the corresponding full length or
mature protein. One of skill in the art could determine the corresponding amino acid
sequence using techniques well known in the art to translate and analyze all possible six
frames. Polypeptides of the invention also include polypeptides with biological activity
that are encoded by (a) any of the polynucleotides having a nucleotide sequence set forth
in the 784 SEQ ID NO: 1-10,289; or (b) polynucleotides that hybridize to the
Table 4 (identified as Table4(784).doc on the enclosed CD) shows the various
tissue sources of the EST sequences from Hyseq's database which were used to assemble
the contigs or nucleic acids of the present invention (identified by 784 SEQ ID NO: 1-
10,289).

The nearest neighbor result for the assembled contig was obtained by a FASTA
version 3 search against Genpept release 114, using FASTXY algorithm. FASTXY is an
improved version of FASTA alignment which allows in-codon frame shifts. The nearest
neighbor result showed the closest homologue for each assemblage from Genpept (and
contains the translated amino acid sequences for which the assemblage encodes). The
nearest neighbor results for 784 SEQ ID NO: 1-10,289 are shown in Table 5, infra.

6.7 The 785 Gene Family 

Novel Nucleic Acid Sequences Obtained From Various Libraries 

A plurality of novel nucleic acids were obtained from cDNA libraries prepared
from various human tissues and in some cases isolated from a genomic library derived
from human chromosome using standard PCR, SBH sequence signature analysis and
Sanger sequencing techniques. The inserts of the library were amplified with PCR using
primers specific for the vector sequences which flank the inserts. Clones from cDNA
libraries were spotted on nylon membrane filters and screened with oligonucleotide
probes (e.g., 7-mers) to obtain signature sequences. The clones were clustered into groups
of similar or identical sequences. Representative clones were selected for sequencing.

In some cases, the 5' sequence of the amplified inserts was then deduced using a
typical Sanger sequencing protocol. PCR products were purified and subjected to
fluorescent dye terminator cycle sequencing. Single pass gel sequencing was done using
a 377 Applied Biosystems (ABI) sequencer to obtain the novel nucleic acid sequences. In
some cases RACE (Rapid Amplification of cDNA Ends) was performed to further
extend the sequence in the 5' direction.

The novel contigs of the invention were assembled from sequences that were
obtained from a cDNA library by methods described above, and in some cases sequences
obtained from one or more public databases. Chromatograms were base called and
assembled using a software suite from University of Washington, Seattle containing three
applications designated PHRED, PHRAP, and CONSED. The sequences for the resulting
contigs are designated as 785 SEQ ID NO: 1-3,796 and are provided in the Sequence
Listing. The contigs were assembled using an EST sequence as a seed. Then a recursive
algorithm was used to extend the seed EST into an extended assemblage, by pulling
additional sequences from different databases (i.e., Hyseq's database containing EST
sequences, dbEST version 114, gb pri 114, and UniGene version 101) that belong to this
assemblage. The algorithm terminated when there were no additional sequences from the
above databases that would extend the assemblage. Inclusion of component sequences
into the assemblage was based on a BLASTN hit to the extending assemblage with
BLAST score greater than 300 and percent identity greater than 95%.

The nearest neighbor result for the assembled contig was obtained by a FASTA
version 3 search against Genpept release 114, using FASTXY algorithm. FASTXY is an
improved version of FASTA alignment which allows in-codon frame shifts. The nearest
neighbor result showed the closest homologue for each assemblage from Genpept (and
contains the translated amino acid sequences for which the assemblage encodes). The
nearest neighbor results for 785 SEQ ID NO: 1-3,796 are shown in Table 6 (identified as
Table6(785).doc on the enclosed CD), infra.

The nucleotide sequence within the assembled contigs that codes for signal
peptide sequences and their cleavage sites can be determined using the neural network
SignalP V1.1 program (from Center for Biological Sequence Analysis, The Technical
University of Denmark). The process for identifying prokaryotic and eukaryotic signal
peptides and their cleavage sites is also disclosed by Henrik Nielsen, Jacob
Engelbrecht, Søren Brunak, and Gunnar von Heijne in the publication "Identification of
prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites," Protein
Engineering, vol. 10, no. 1, pp. 1-6 (1997), incorporated herein by reference. A maximum
S score and a mean S score, as described in the Nielsen et al. reference, are obtained
from each assembled contig. Table 7 sets forth the nucleotide sequence range for each
sequence of 785 SEQ ID NO: 1-3,796 that encodes a corresponding forty-five amino acid
sequence containing the signal peptide sequence and its cleavage site, the maximum S
score and the mean S score obtained for each sequence. Not all forty-five amino acids in
the sequence may comprise the signal peptide.
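
As a small worked example of the arithmetic implied above, a forty-five amino acid window corresponds to 45 x 3 = 135 nucleotides. The sketch below computes the nucleotide range of such a window from its first nucleotide. It assumes 1-based, inclusive coordinates on the forward strand, which is an assumption made for illustration and is not stated in the text; the function name is hypothetical.

```python
WINDOW_AA = 45          # length of the window in amino acids
NT_PER_CODON = 3        # nucleotides per codon


def window_nucleotide_range(first_nt: int) -> tuple:
    """Return the (start, end) nucleotide range encoding a 45-residue window.

    Assumes 1-based inclusive coordinates on the forward strand.
    """
    length_nt = WINDOW_AA * NT_PER_CODON  # 135 nucleotides
    return (first_nt, first_nt + length_nt - 1)
```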
6.8 The 787 Gene Family

Table 8 (identified as Table8(787).doc on the enclosed CD) sets forth the novel
predicted polypeptides (including proteins) encoded by the novel polynucleotides (787 SEQ
ID NO: 1-3960) of the present invention, and their corresponding nucleotide locations to
each of 787 SEQ ID NO: 1-3960. Table 8 also indicates the method by which the
polypeptide was predicted. Method A refers to a polypeptide obtained by using a software
program called FASTY (available from http://fasta.bioch.virginia.edu) which selects a
polypeptide based on a comparison of translated novel polynucleotide to known
polypeptides (W.R. Pearson, Methods in Enzymology, 183: 63-98 (1990), incorporated
herein by reference). Method B refers to a polypeptide obtained by using a software
program called GenScan for human/vertebrate sequences (available from Stanford
University, Office of Technology Licensing) that predicts the polypeptide based on a
probabilistic model of gene structure/compositional properties (C. Burge and S. Karlin, J.
Mol. Biol., 268: 78-94 (1997), incorporated herein by reference). Method C refers to a
polypeptide obtained by using a Hyseq proprietary software program that translates the
novel polynucleotide and its complementary strand into six possible amino acid sequences
(forward and reverse frames) and chooses the polypeptide with the longest open reading
frame. When the predicted beginning nucleotide of Table 8 is a higher number than the
predicted end nucleotide of Table 8, then the amino acid sequence is derived from the
complementary strand of the indicated SEQ ID NO. The locations of the predicted
beginning and end nucleotides correlate to the nucleotide sequence of the indicated SEQ ID
NO, not its complementary strand.
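
The coordinate convention described above (a beginning nucleotide higher than the end nucleotide indicates the complementary strand, with both positions counted on the listed sequence) can be sketched as follows. The sketch assumes 1-based, inclusive positions, which the text does not state explicitly; the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the complementary strand read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def coding_region(contig: str, begin: int, end: int) -> str:
    """Extract the coding nucleotides for a predicted polypeptide.

    `begin` and `end` are positions on the listed sequence (assumed 1-based,
    inclusive). If begin > end, the region is read from the complementary
    strand, so the listed-strand slice is reverse-complemented.
    """
    if begin <= end:
        return contig[begin - 1:end]
    return reverse_complement(contig[end - 1:begin])
```

For example, with the toy contig `AAATGCCCGG`, positions 3 to 8 give the forward-strand region, while positions 8 to 3 give the reverse complement of the same span.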

The isolated polypeptides of the invention include, but are not limited to, a
polypeptide comprising any of the amino acid sequences set forth in Table 8 or from six
frame translations of 787 SEQ ID NO: 1-3960; or the corresponding full length or mature
protein. One of skill in the art could determine the corresponding amino acid sequence
using techniques well known in the art to translate and analyze all possible six frames.
Polypeptides of the invention also include polypeptides with biological activity that are
encoded by (a) any of the polynucleotides having a nucleotide sequence set forth in the
787 SEQ ID NO: 1-3960; or (b) polynucleotides that hybridize to the complement of the
into the assemblage was based on a BLASTN hit to the extending assemblage with
BLAST score greater than 300 and percent identity greater than 95%.

Table 9 (identified as Table9(787).doc on the enclosed CD) shows the various
tissue sources of the EST sequences from Hyseq's database which were used to assemble
the contigs or nucleic acids of the present invention (identified by 787 SEQ ID NO: 1-
3960).

The nearest neighbor result for the assembled contig was obtained by a FASTA
version 3 search against Genpept release 114, using FASTXY algorithm. FASTXY is an
improved version of FASTA alignment which allows in-codon frame shifts. The nearest
neighbor result showed the closest homologue for each assemblage from Genpept (and
contains the translated amino acid sequences for which the assemblage encodes). The
nearest neighbor results for 787 SEQ ID NO: 1-3960 are shown in Table 10
(identified as Table10(787).doc on the enclosed CD), infra.

6.9 The 788 Gene Family 

Table 11 (identified as Table11(788).doc on the enclosed CD) sets forth the novel
predicted polypeptides (including proteins) encoded by the novel polynucleotides (788 SEQ
ID NO: 1-14,074) of the present invention, and their corresponding nucleotide locations to
each of 788 SEQ ID NO: 1-14,074. Table 11 also indicates the method by which the
polypeptide was predicted. Method A refers to a polypeptide obtained by using a software
program called FASTY (available from http://fasta.bioch.virginia.edu) which selects a
polypeptide based on a comparison of translated novel polynucleotide to known
polypeptides (W.R. Pearson, Methods in Enzymology, 183: 63-98 (1990), incorporated
herein by reference). Method B refers to a polypeptide obtained by using a software
program called GenScan for human/vertebrate sequences (available from Stanford
University, Office of Technology Licensing) that predicts the polypeptide based on a
probabilistic model of gene structure/compositional properties (C. Burge and S. Karlin, J.
Mol. Biol., 268: 78-94 (1997), incorporated herein by reference). Method C refers to a
polypeptide obtained by using a Hyseq proprietary software program that translates the
novel polynucleotide and its complementary strand into six possible amino acid sequences
(forward and reverse frames) and chooses the polypeptide with the longest open reading
frame. When the predicted beginning nucleotide of Table 11 is a higher number than the
Biosystems (ABI) sequencer to obtain the novel nucleic acid sequences. In some cases
RACE (Rapid Amplification of cDNA Ends) was performed to further extend the
sequence in the 5' direction.

Novel Contigs

The novel contigs of the invention were assembled from sequences that were
obtained from a cDNA library by methods described above, and in some cases sequences
obtained from one or more public databases. The sequences for the resulting contigs are
designated as 788 SEQ ID NO: 1-14,074 and are provided in the attached Sequence
Listing. The contigs were assembled using an EST sequence as a seed. Then a recursive
algorithm was used to extend the seed EST into an extended assemblage, by pulling
additional sequences from different databases (i.e., Hyseq's database containing EST
sequences, dbEST version 114, gb pri 114, and UniGene version 101) that belong to this
assemblage. The algorithm terminated when there were no additional sequences from the
above databases that would extend the assemblage. Inclusion of component sequences
into the assemblage was based on a BLASTN hit to the extending assemblage with
BLAST score greater than 300 and percent identity greater than 95%.

Table 12 (identified as Table12(788).doc on the enclosed CD) shows the various
tissue sources of the EST sequences from Hyseq's database which were used to assemble
the contigs or nucleic acids of the present invention (identified by 788 SEQ ID NO: 1-
14,074).

The nearest neighbor result for the assembled contig was obtained by a FASTA
version 3 search against Genpept release 115, using FASTXY algorithm. FASTXY is an
improved version of FASTA alignment which allows in-codon frame shifts. The nearest
neighbor result showed the closest homologue for each assemblage from Genpept (and
contains the translated amino acid sequences for which the assemblage encodes). The
nearest neighbor results for 788 SEQ ID NO: 1-14,074 are shown in Table 13
(identified as Table13(788).doc on the enclosed CD), infra.

6.10 The 789 Gene Family 
Table 14 (identified as Table14(789).doc on the enclosed CD) sets forth the novel
predicted polypeptides (including proteins) encoded by the novel polynucleotides (789
SEQ ID NO: 1-6,391) of the present invention, and their corresponding nucleotide
locations to each of 789 SEQ ID NO: 1-6,391. Table 14 also indicates the method by which
the polypeptide was predicted. Method A refers to a polypeptide obtained by using a
software program called FASTY (available from http://fasta.bioch.virginia.edu) which
selects a polypeptide based on a comparison of translated novel polynucleotide to known
polypeptides (W.R. Pearson, Methods in Enzymology, 183: 63-98 (1990), incorporated
herein by reference). Method B refers to a polypeptide obtained by using a software
program called GenScan for human/vertebrate sequences (available from Stanford
University, Office of Technology Licensing) that predicts the polypeptide based on a
probabilistic model of gene structure/compositional properties (C. Burge and S. Karlin, J.
Mol. Biol., 268: 78-94 (1997), incorporated herein by reference). Method C refers to a
polypeptide obtained by using a Hyseq proprietary software program that translates the
novel polynucleotide and its complementary strand into six possible amino acid
sequences (forward and reverse frames) and chooses the polypeptide with the longest
open reading frame.

A plurality of novel nucleic acids were obtained from cDNA libraries prepared
from various human tissues and in some cases isolated from a genomic library derived
from human chromosome using standard PCR, SBH sequence signature analysis and
Sanger sequencing techniques. The inserts of the library were amplified with PCR using
primers specific for the vector sequences which flank the inserts. Clones from cDNA
libraries were spotted on nylon membrane filters and screened with oligonucleotide
probes (e.g., 7-mers) to obtain signature sequences. The clones were clustered into groups
of similar or identical sequences. Representative clones were selected for sequencing.

In some cases, the 5' sequence of the amplified inserts was then deduced using a
typical Sanger sequencing protocol. PCR products were purified and subjected to
fluorescent dye terminator cycle sequencing. Single pass gel sequencing was done using
a 377 Applied Biosystems (ABI) sequencer to obtain the novel nucleic acid sequences. In
some cases RACE (Rapid Amplification of cDNA Ends) was performed to further
extend the sequence in the 5' direction.

Novel Contigs
The novel contigs or nucleic acids of the present invention
were assembled from sequences that were obtained from a cDNA library by methods
described above, and in some cases sequences obtained from one or more public
databases. The sequences for the resulting contigs are designated as 789 SEQ ID NO: 1-
6,391 and are provided in the attached Sequence Listing. The contigs were assembled
using an EST sequence as a seed. Then a recursive algorithm was used to extend the seed
EST into an extended assemblage, by pulling additional sequences from different
databases (i.e., Hyseq's database containing EST sequences, dbEST version 114, gb pri
114, and UniGene version 101) that belong to this assemblage. The algorithm terminated
when there were no additional sequences from the above databases that would extend the
assemblage. Inclusion of component sequences into the assemblage was based on a
BLASTN hit to the extending assemblage with BLAST score greater than 300 and
percent identity greater than 95%.

Table 15 (identified as Table15(789).doc on the enclosed CD) shows the various
tissue sources of the EST sequences from Hyseq's database which were used to assemble
the contigs or nucleic acids of the present invention (identified by 789 SEQ ID NO: 1-
6,391).

The nearest neighbor result for the assembled contig was obtained by a FASTA
version 3 search against Genpept release 115, using FASTXY algorithm. FASTXY is an
improved version of FASTA alignment which allows in-codon frame shifts. The nearest
neighbor result showed the closest homologue for each assemblage from Genpept (and
contains the translated amino acid sequences for which the assemblage encodes). The
nearest neighbor results for 789 SEQ ID NO: 1-6,391 are shown in Table 16
(identified as Table16(789).doc on the enclosed CD), infra.

6.11 The 790 Gene Family

Table 17 (identified as Table17(790).doc on the enclosed CD) sets forth the novel
predicted polypeptides (including proteins) encoded by the novel polynucleotides (790
SEQ ID NO: 1-30,553) of the present invention, and their corresponding start and stop
nucleotide locations to each of 790 SEQ ID NO: 1-30,553. Table 17 also indicates the
method by which the polypeptide was predicted. Method A refers to a polypeptide
database containing EST sequences, dbEST version 115, gb pri 115, and UniGene
version 10.3, and exons from public domain genomic sequences predicted by GenScan)
that belong to this assemblage. The algorithm terminated when there were no additional
sequences from the databases that would extend the assemblage. Further, the inclusion of
component sequences into the assemblage was based on a BLASTN hit to the extending
assemblage with BLAST score greater than 300 and percent identity greater than 95%.

Table 18 (identified as Table18(790).doc on the enclosed CD) shows the various
tissue sources of the EST sequences from Hyseq's database which were used to assemble
the contigs or nucleic acids of the present invention (identified by 790 SEQ ID NO: 1-
30,553).

The nearest neighbor result for the assembled contig was obtained by a FASTA
version 3 search against Genpept release 115, using FASTXY algorithm. FASTXY is an
improved version of FASTA alignment which allows in-codon frame shifts. The nearest
neighbor result showed the closest homologue for each assemblage from Genpept (and
contains the translated amino acid sequences for which the assemblage encodes). The
nearest neighbor results for 790 SEQ ID NO: 1-30,553 are shown in Table 19
(identified as Table19(790).doc on the enclosed CD), infra.

6.12 The 791 Gene Family 

Table 20 (identified as Table20(791).doc on the enclosed CD) sets forth the novel
predicted polypeptides (including proteins) encoded by the novel polynucleotides (791
SEQ ID NO: 1-5,822) of the present invention, and their corresponding nucleotide
locations to each of 791 SEQ ID NO: 1-5,822. Table 20 also indicates the method by
which the polypeptide was predicted. Method A refers to a polypeptide obtained by using
a software program called FASTY (available from http://fasta.bioch.virginia.edu) which
selects a polypeptide based on a comparison of translated novel polynucleotide to known
polypeptides (W.R. Pearson, Methods in Enzymology, 183: 63-98 (1990), incorporated
herein by reference). Method B refers to a polypeptide obtained by using a software
program called GenScan for human/vertebrate sequences (available from Stanford
University, Office of Technology Licensing) that predicts the polypeptide based on a
probabilistic model of gene structure/compositional properties (C. Burge and S. Karlin, J.
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Table 21 (identified as Table21(791).doc on the enclosed CD) shows the various 
tissue sources of the EST sequences from Hyseq's database which were used to assemble
the contigs or nucleic acids of the present invention (identified by 791 SEQ ID NO: 1-
5,822).

The nearest neighbor result for the assembled contig was obtained by a FASTA
version 3 search against Genpept release 115, using the FASTXY algorithm. FASTXY is an
improved version of FASTA alignment which allows in-codon frame shifts. The nearest
neighbor result showed the closest homologue for each assemblage from Genpept (and
contains the translated amino acid sequences for which the assemblage encodes). The
nearest neighbor results for 791 SEQ ID NO: 1-5,822 are shown in Table 22
(identified as Table22(791).doc on the enclosed CD) infra.

6.13 The 792 Gene Family 

Table 23 sets forth the novel predicted polypeptides (including proteins) encoded
by the novel polynucleotides (792 SEQ ID NO: 1-8,502) of the present invention, and
their corresponding nucleotide locations to each of 792 SEQ ID NO: 1-8,502. Table 23
also indicates the method by which the polypeptide was predicted. Method A refers to a
polypeptide obtained by using a software program called FASTY (available from
http://fasta.bioch.virginia.edu) which selects a polypeptide based on a comparison of
translated novel polynucleotide to known polypeptides (W.R. Pearson, Methods in
Enzymology, 183: 63-98 (1990), incorporated herein by reference). Method B refers to a
polypeptide obtained by using a software program called GenScan for human/vertebrate
sequences (available from Stanford University, Office of Technology Licensing) that
predicts the polypeptide based on a probabilistic model of gene structure/compositional
properties (C. Burge and S. Karlin, J. Mol. Biol., 268: 78-94 (1997), incorporated herein
by reference). Method C refers to a polypeptide obtained by using a Hyseq proprietary
software program that translates the novel polynucleotide and its complementary strand
into six possible amino acid sequences (forward and reverse frames) and chooses the
polypeptide with the longest open reading frame.
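The general idea behind Method C, six-frame translation followed by selection of the longest open reading frame, can be illustrated with the minimal sketch below. It is not the Hyseq proprietary program, whose details are not given in the text. The sketch assumes the standard genetic code and treats an open reading frame as the longest stretch of codons uninterrupted by a stop codon, without requiring a start codon. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks a stop codon).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the complementary strand read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unrecognized codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Yield (strand, offset, protein) for the three forward and three reverse frames."""
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            yield strand, offset, translate(s[offset:])


def longest_orf(seq):
    """Return (protein, strand, offset) for the longest stop-free stretch.

    Assumption: ties are resolved in favor of the frame examined first.
    """
    best = ("", "+", 0)
    for strand, offset, protein in six_frames(seq):
        for segment in protein.split("*"):
            if len(segment) > len(best[0]):
                best = (segment, strand, offset)
    return best
```

For example, for the nine-nucleotide input `ATGGCCTGA` the forward frames yield at most two residues before a stop or the end of the sequence, while the first reverse frame reads `TCA GGC CAT` and gives the three-residue peptide `SGH`, so `longest_orf` selects the reverse strand.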

The isolated polypeptides of the invention include, but are not limited to, a
polypeptide comprising any of the amino acid sequences set forth in Table 23 or from six
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